Find in Ordered Set (Practice Interview Question)


Suppose we had a list ↴

Quick reference

	Worst Case
space	$O(n)$
lookup	$O(1)$
append	$O(1)$
insert	$O(n)$
delete	$O(n)$

An array organizes items sequentially, one after another in memory.

Each position in the array has an index, starting at 0.

Strengths:

Fast lookups. Retrieving the element at a given index takes $O(1)$ time, regardless of the length of the array.
Fast appends. Adding a new element at the end of the array takes $O(1)$ time, if the array has space.

Weaknesses:

Fixed size. You need to specify how many elements you're going to store in your array ahead of time. (Unless you're using a fancy dynamic array.)
Costly inserts and deletes. You have to "scoot over" the other elements to fill in or close gaps, which takes worst-case $O(n)$ time.

In Python 3.6

Some languages (including Python) don't have these bare-bones arrays.

Here's what arrays look like in Java.

// instantiate an array that holds 10 integers
int[] gasPrices = new int[10];

// assign values by index (unassigned slots default to 0)
gasPrices[0] = 346;
gasPrices[1] = 360;
gasPrices[2] = 354;


Inserting

If we want to insert something into an array, first we have to make space by "scooting over" everything starting at the index we're inserting into:

An array of letters. From top to bottom, the values in the array are A, B, C, E, F, and G. The letter D is being inserted at the position of E, and the letters E, F, and G are each shown "scooting over" one index up to make room.

In the worst case we're inserting into the 0th index in the array (prepending), so we have to "scoot over" everything in the array. That's $O(n)$ time.
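
The scooting described above can be sketched in Python. This is only an illustration: it assumes a fixed-capacity array modeled as a list padded with None plus a count of used slots, and the helper name insert_at is hypothetical.

```python
def insert_at(slots, size, index, value):
    """Insert value at index in a fixed-capacity array.

    slots is a list whose first `size` entries are in use
    (an assumed representation, for illustration only).
    Returns the new size.
    """
    if size >= len(slots):
        raise ValueError("array is full")
    if not 0 <= index <= size:
        raise IndexError("index out of range")

    # Scoot everything from index onward one slot to the right,
    # working backward so nothing gets overwritten
    for i in range(size, index, -1):
        slots[i] = slots[i - 1]

    slots[index] = value
    return size + 1


letters = ['A', 'B', 'C', 'E', 'F', 'G', None]
size = insert_at(letters, 6, 3, 'D')
```

The loop touches every element from the insertion point to the end, so inserting at index 0 moves all of the items.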

Deleting

Array elements are stored adjacent to each other. So when we remove an element, we have to fill in the gap—"scooting over" all the elements that were after it:

Another array of letters. From top to bottom, the values in the array are A, B, C, Z, D, E, and F. The letter Z is being deleted, and the letters D, E, and F are each shown "scooting over" one index down to fill the space created by the deletion.

In the worst case we're deleting the 0th item in the array, so we have to "scoot over" everything else in the array. That's $O(n)$ time.
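
Deletion can be sketched the same way. Again this assumes a fixed-capacity array modeled as a Python list with a separate count of used slots; delete_at is a hypothetical helper name, not something from a library.

```python
def delete_at(slots, size, index):
    """Delete the item at index from a fixed-capacity array.

    slots is a list whose first `size` entries are in use
    (an assumed representation, for illustration only).
    Returns the new size.
    """
    if not 0 <= index < size:
        raise IndexError("index out of range")

    # Scoot everything after index one slot to the left
    # to close the gap
    for i in range(index, size - 1):
        slots[i] = slots[i + 1]

    # The last used slot is now a leftover duplicate, so clear it
    slots[size - 1] = None
    return size - 1


letters = ['A', 'B', 'C', 'Z', 'D', 'E', 'F']
size = delete_at(letters, 7, 3)
```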

Why not just leave the gap? Because the quick lookup power of arrays depends on everything being sequential and uninterrupted. This lets us predict exactly how far from the start of the array the 138th or 9,203rd item is. If there are gaps, we can no longer predict exactly where each array item will be.
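
To make that prediction concrete, here is a small sketch of the arithmetic. It assumes every item takes the same number of bytes; the base address and item size below are made-up numbers for illustration.

```python
def slot_address(base_address, item_size, index):
    """Where the item at `index` lives, assuming no gaps
    and a fixed item size (illustrative model only)."""
    return base_address + item_size * index


# Hypothetical array starting at address 1000 with 4-byte items
address = slot_address(1000, 4, 138)
```

The calculation is one multiplication and one addition no matter how large the index is, which is why the lookup is $O(1)$.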

Data structures built on arrays

Arrays are the building blocks for lots of other, more complex data structures.

Don't want to specify the size of your array ahead of time? One option: use a dynamic array.

Want to look up items by something other than an index? Use a dictionary.

of $n$ integers sorted in ascending order. How quickly could we check if a given integer is in the list?

Solution

Because the list is sorted, we can use binary search ↴

A binary search algorithm finds an item in a sorted list in $O(\lg n)$ time.

A brute force search would walk through the whole list, taking $O(n)$ time in the worst case.
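
For comparison, a brute force search might look like the sketch below (linear_search is just an illustrative name):

```python
def linear_search(target, nums):
    """See if target appears in nums by checking every item"""
    for num in nums:
        if num == target:
            return True

    # We walked the whole list without finding the target
    return False
```

If the target is missing, the loop runs once per item, which is the $O(n)$ worst case.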

Let's say we have a sorted list of numbers. To find a number with a binary search, we:

Start with the middle number: is it bigger or smaller than our target number? Since the list is sorted, this tells us if the target would be in the left half or the right half of our list.
We've effectively divided the problem in half. We can "rule out" the whole half of the list that we know doesn't contain the target number.
Repeat the same approach (of starting in the middle) on the new half-size problem. Then do it again and again, until we either find the number or "rule out" the whole set.

We can do this recursively, or iteratively. Here's an iterative version:

def binary_search(target, nums):
    """See if target appears in nums"""

    # We think of floor_index and ceiling_index as "walls" around
    # the possible positions of our target, so by -1 below we mean
    # to start our wall "to the left" of the 0th index
    # (we *don't* mean "the last index")
    floor_index = -1
    ceiling_index = len(nums)

    # If there isn't at least 1 index between floor and ceiling,
    # we've run out of guesses and the number must not be present
    while floor_index + 1 < ceiling_index:

        # Find the index ~halfway between the floor and ceiling
        # We use integer division, so we'll never get a "half index"
        distance = ceiling_index - floor_index
        half_distance = distance // 2
        guess_index = floor_index + half_distance

        guess_value = nums[guess_index]

        if guess_value == target:
            return True

        if guess_value > target:
            # Target is to the left, so move ceiling to the left
            ceiling_index = guess_index
        else:
            # Target is to the right, so move floor to the right
            floor_index = guess_index

    return False
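
For comparison, here is one way the recursive version could look. It is a sketch that reuses the same "walls" idea; the function name and default arguments are choices made for this example.

```python
def binary_search_recursive(target, nums, floor_index=-1, ceiling_index=None):
    """See if target appears in the sorted list nums"""
    if ceiling_index is None:
        ceiling_index = len(nums)

    # No index left between the walls: the target isn't present
    if floor_index + 1 >= ceiling_index:
        return False

    # Find the index ~halfway between the floor and ceiling
    guess_index = floor_index + (ceiling_index - floor_index) // 2
    guess_value = nums[guess_index]

    if guess_value == target:
        return True

    if guess_value > target:
        # Target is to the left, so recurse with a lower ceiling
        return binary_search_recursive(target, nums, floor_index, guess_index)

    # Target is to the right, so recurse with a higher floor
    return binary_search_recursive(target, nums, guess_index, ceiling_index)
```

One trade-off: each recursive call adds a frame to the call stack, so this version uses $O(\lg n)$ space in Python, while the iterative version needs only $O(1)$.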

How did we know the time cost of binary search was $O(\lg n)$? The only non-constant part of our time cost is the number of times our while loop runs. Each step of our while loop cuts the range (dictated by floor_index and ceiling_index) in half, until our range has just one element left.

So the question is, "how many times must we divide our original list size ($n$) in half until we get down to 1?"

$$n \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \ldots = 1$$

How many $\frac{1}{2}$'s are there? We don't know yet, but we can call that number $x$:

$$n \cdot \left(\frac{1}{2}\right)^x = 1$$

Now we solve for $x$:

$$n \cdot \frac{1^x}{2^x} = 1$$

$$n \cdot \frac{1}{2^x} = 1$$

$$\frac{n}{2^x} = 1$$

$$n = 2^x$$

Now to get the $x$ out of the exponent. How do we do that? Logarithms.

Recall that $\log_{10}{100}$ means, "what power must we raise 10 to, to get 100?" The answer is 2.

So in this case, if we take the $\log_{2}$ of both sides...

$$\log_{2}{n} = \log_{2}{2^x}$$

The right hand side asks, "what power must we raise $2$ to, to get $2^x$?" Well, that's just $x$.

$$\log_{2}{n} = x$$

So there it is. The number of times we must divide $n$ in half to get down to 1 is $\log_{2}{n}$. So our total time cost is $O(\lg n)$.
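
As a quick sanity check of that result, the sketch below counts halvings directly and prints them next to Python's math.log2. The helper count_halvings is hypothetical, and it uses integer division, so for sizes that aren't powers of 2 the count matches $\log_{2}{n}$ rounded down.

```python
import math


def count_halvings(n):
    """Count how many times n can be halved (rounding down)
    before it reaches 1"""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count


for n in (8, 1024, 1_000_000):
    print(n, count_halvings(n), math.log2(n))
```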

Careful: we can only use binary search if the input list is already sorted.

to find the item in $O(\lg n)$ time and $O(1)$ additional space.
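
In practice, Python's standard library already provides a binary search in the bisect module. A minimal sketch of a membership check built on it (contains is a hypothetical helper name, and the input is assumed to be sorted in ascending order):

```python
from bisect import bisect_left


def contains(sorted_nums, target):
    """See if target appears in sorted_nums (must be sorted ascending)"""
    # bisect_left returns the leftmost index where target could be
    # inserted while keeping the list sorted
    i = bisect_left(sorted_nums, target)

    # target is present only if that index holds an equal value
    return i < len(sorted_nums) and sorted_nums[i] == target
```

For example, contains([1, 3, 5, 7], 5) evaluates to True and contains([1, 3, 5, 7], 4) to False.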
